XML Format Guidelines for the TUNA Corpus
Abstract
This document forms part of the 2008 distribution of the TUNA Corpus, Version 1.0, the first public release of the complete TUNA Corpus of Referring Expressions. A subset of the corpus was used in the first Shared Task and Evaluation Challenge for NLG, the Attribute Selection for the Generation of Referring Expressions Challenge (ASGRE), co-located with the Workshop on Using Corpora in NLG. A subset is also being used for the second edition of the Challenge (the REG Challenge 2008), to be held in Ohio in June 2008, co-located with the International Conference on NLG. Both of these earlier releases consist exclusively of the singular referring expressions in the TUNA Corpus; moreover, the annotation for both ASGRE 2007 and REG 2008 uses a different format, designed specifically for the tasks involved. This release contains the final version of the TUNA annotation and includes the full corpus, that is, both singular and plural descriptions.
Similar resources
Designing the Latvian Speech Recognition Corpus
In this paper the authors present the first Latvian speech corpus designed specifically for speech recognition purposes. The paper outlines the decisions made in the corpus design process through analysis of related work on speech corpora creation for different languages. The authors also provide the guidelines that were used for the creation of the Latvian speech recognition corpus. The corpus ...
Cost-based attribute selection for GRE
In this paper we discuss several approaches to the problem of content determination for the generation of referring expressions (GRE) using the Graph-based framework of Krahmer et al. (2003). This work was carried out in the context of the First NLG Shared Task and Evaluation Challenge on Attribute Selection for Referring Expression Generation. In the shared task proper of the Challenge the outp...
Cost-based attribute selection for GRE (GRAPH-SC/GRAPH-FP)
In this paper we discuss several approaches to the problem of content determination for the generation of referring expressions (GRE) using the Graph-based framework of Krahmer et al. (2003). This work was carried out in the context of the First NLG Shared Task and Evaluation Challenge on Attribute Selection for Referring Expression Generation. In the shared task proper of the Challenge the outp...
Processing XML Text with Python and ElementTree: a Practical Experience
In this paper, we evaluate the use of XML format as an internal format for storing texts in linguistic corpora, and describe our experience in using the ElementTree Python XML parser in the Slovak National Corpus.
Constraints for corpora development and validation
In this paper we consider corpora as a set of XML documents. The guidelines for the creation of the corpora determine the semantics of the data stored in them. Usually the guidelines prescribe the actual structure of the corpora, the symbols used, their meaning and the relations among them. Ideally, the software supporting the creation of a corpus has to allow all the constraints that follow f...